
GPT-FAST-MIXTRAL-MOE integration #3151

Merged
merged 3 commits into pytorch:master on May 21, 2024

Conversation

alex-kharlamov (Contributor)

Description

Integrate the gpt-fast Mixtral-MoE example into TorchServe.

Type of change

  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing

Model inference logs (int8-quantized, tensor-parallel Mixtral-MoE) are provided in the attached file:
mixtral_moe_logs.txt
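
For context, TorchServe's large-model examples drive gpt-fast through a per-model `model-config.yaml`. The sketch below is illustrative only: the frontend keys (`parallelType`, `deviceType`, `torchrun`) are standard TorchServe large-model-inference settings, while the `handler` keys (`converted_ckpt_dir`, `max_new_tokens`, `compile`) and the checkpoint path are assumptions modeled on the existing gpt-fast example; see the example's README added in this PR for the exact configuration.

```yaml
# Illustrative model-config.yaml for serving an int8-quantized,
# tensor-parallel Mixtral-MoE checkpoint with the gpt-fast handler.
# Handler keys and the checkpoint path below are assumptions based on
# TorchServe's gpt-fast example; check the example README for exact fields.
minWorkers: 1
maxWorkers: 1
maxBatchDelay: 200
responseTimeout: 300
deviceType: "gpu"
parallelType: "tp"              # tensor parallelism across GPUs
torchrun:
  nproc-per-node: 4             # one worker process per GPU shard
handler:
  converted_ckpt_dir: "checkpoints/mistralai/Mixtral-8x7B-v0.1"  # hypothetical path
  max_new_tokens: 50
  compile: true                 # torch.compile the decoding loop
```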

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@agunapal self-requested a review May 21, 2024 16:22
@agunapal (Collaborator) left a comment:


Awesome! Thank you for your contribution.
LGTM

@agunapal added this pull request to the merge queue May 21, 2024
Merged via the queue into pytorch:master with commit b891309 May 21, 2024
9 of 12 checks passed